Accessible PDFs and PDF archiving standards

PDF is a flexible format, and using PDF in certain contexts requires additional conventions. For example, PDFs are not accessible by default; they define how characters are placed on a page but do not contain semantic information on the content. However, it is possible to generate accessible PDFs, which use tagging to add semantic information to the document.

Pandoc defaults to LaTeX to generate PDF. Tagging support in LaTeX is in development and not readily available, so PDFs generated in this way will always be untagged and not accessible. This means that alternative engines must be used to generate accessible PDFs.
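Whether a given PDF carries tags at all can be checked after generation. The following is a minimal sketch, assuming the `pdfinfo` tool from Poppler is installed (it is not part of pandoc); its report contains a `Tagged:` line. The helper name `is_tagged` is made up for this illustration.

```shell
# is_tagged: hypothetical helper that reads `pdfinfo` output on stdin.
# Succeeds (exit status 0) only if the report contains "Tagged: yes".
is_tagged() {
  awk -F': *' '$1 == "Tagged" { found = ($2 == "yes") } END { exit !found }'
}

# Usage (assumes pdfinfo from Poppler and an existing output.pdf):
#   pdfinfo output.pdf | is_tagged && echo "tagged" || echo "untagged"
```

A positive result only shows that tags are present; it says nothing about their quality.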

The PDF standards PDF/A and PDF/UA define further restrictions intended to optimize PDFs for archiving and accessibility. Tagging is commonly used in combination with these standards to ensure best results.

Note, however, that standard compliance depends on many things, including the colorspace of embedded images. Pandoc cannot check this, and external programs must be used to ensure that generated PDFs are in compliance.
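As an illustration of such an external check, the sketch below wraps the veraPDF command-line validator. This is an assumption about tooling, not something pandoc provides: it presumes a `verapdf` executable on the PATH that accepts a `--flavour` option with profile ids such as `2b` or `ua1`. The function name `validate_pdf` is hypothetical.

```shell
# validate_pdf FLAVOUR FILE: hypothetical wrapper around the veraPDF CLI.
# FLAVOUR is assumed to be a veraPDF profile id, e.g. 2b (PDF/A-2b)
# or ua1 (PDF/UA-1). Returns 127 if verapdf is not available.
validate_pdf() {
  if ! command -v verapdf >/dev/null 2>&1; then
    echo "verapdf not found in PATH" >&2
    return 127
  fi
  verapdf --flavour "$1" "$2"
}

# Usage (assumes output.pdf exists):
#   validate_pdf ua1 output.pdf
```

Other validators can be substituted; the point is only that the check happens outside pandoc.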

ConTeXt

ConTeXt always produces tagged PDFs, but the quality depends on the input. The default ConTeXt markup generated by pandoc is optimized for readability and reuse, not tagging. Enable the tagging format extension to force markup that is optimized for tagging. This can be combined with the pdfa variable to generate standard-compliant PDFs. E.g.:

pandoc --to=context+tagging -V pdfa=3a

A recent ConTeXt version should be used, as older versions contained a bug that led to invalid PDF metadata.

WeasyPrint

The HTML-based engine WeasyPrint has included experimental support for PDF/A and PDF/UA since version 57. Tagged PDFs can be created with

pandoc --pdf-engine=weasyprint \
--pdf-engine-opt=--pdf-variant=pdf/ua-1 ...

The feature is experimental and standard compliance should not be assumed.

Prince XML

The non-free HTML-to-PDF converter prince has extensive support for various PDF standards as well as tagging. E.g.:

pandoc --pdf-engine=prince \
--pdf-engine-opt=--tagged-pdf ...

See the prince documentation for more info.

Typst

Typst 0.12 can produce PDF/A-2b:

pandoc --pdf-engine=typst --pdf-engine-opt=--pdf-standard=a-2b ...

Word Processors

Word processors like LibreOffice and MS Word can also be used to generate standardized and tagged PDF output. Pandoc does not support direct conversions via these tools. However, pandoc can convert a document to a docx or odt file, which can then be opened and converted to PDF with the respective word processor. See the documentation for Word and LibreOffice.
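The two-step workflow can be sketched as follows, assuming LibreOffice is installed and provides the `soffice` command with its `--headless` and `--convert-to` options. The function name `md_to_pdf_via_libreoffice` and the file names are hypothetical. Whether the resulting PDF is tagged or standard-compliant depends on LibreOffice's PDF export settings, which this sketch leaves at their defaults.

```shell
# md_to_pdf_via_libreoffice INPUT [OUTDIR]: hypothetical helper.
# Step 1: pandoc writes INPUT as a docx file into OUTDIR.
# Step 2: LibreOffice converts that docx file to PDF in OUTDIR.
md_to_pdf_via_libreoffice() {
  input=$1
  outdir=${2:-.}
  base=$(basename "${input%.*}")
  pandoc "$input" -o "$outdir/$base.docx" &&
    soffice --headless --convert-to pdf --outdir "$outdir" "$outdir/$base.docx"
}

# Usage (assumes report.md exists and build/ is a writable directory):
#   md_to_pdf_via_libreoffice report.md build
```

Opening the docx or odt file in the word processor and exporting interactively remains the more direct way to control tagging and PDF/A options.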